Chapter 7 



Optimization and Minimum 
Principles 



7.1 Two Fundamental Examples 



Within the universe of applied mathematics, optimization is often a world of its own.
There are occasional expeditions to other worlds (like differential equations), but
mostly the life of optimizers is self-contained: Find the minimum of $F(x_1, \ldots, x_n)$.
That is not an easy problem, especially when there are many variables $x_j$ and many
constraints on those variables. Those constraints may require $Ax = b$ or $x_j \ge 0$ or
both or worse. Whole books and courses and software packages are dedicated to this
problem of constrained minimization.

I hope you will forgive the poetry of worlds and universes. I am trying to emphasize 
the importance of optimization — a key component of engineering mathematics and 
scientific computing. This chapter will have its own flavor, but it is strongly connected 
to the rest of this book. To make those connections, I want to begin with two specific 
examples. If you read even just this section, you will see the connections. 



Least Squares

Ordinary least squares begins with a matrix $A$ whose $n$ columns are independent.
The rank is $n$, so $A^T A$ is symmetric positive definite. The input vector $b$ has $m$
components, the output $u$ has $n$ components, and $m > n$:

Least squares problem            Minimize $\|Au - b\|^2$
Normal equations for best $u$      $A^T A u = A^T b$          (1)

Those equations $A^T A u = A^T b$ say that the error residual $e = b - Au$ solves $A^T e = 0$.
Then $e$ is perpendicular to columns $1, 2, \ldots, n$ of $A$. Write those zero inner products




©2006 Gilbert Strang 






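as $(\text{column})^T e = 0$ to find $A^T A u = A^T b$:

$$\begin{bmatrix} (\text{column } 1)^T \\ \vdots \\ (\text{column } n)^T \end{bmatrix} \begin{bmatrix} \\ e \\ \\ \end{bmatrix} = \begin{bmatrix} 0 \\ \vdots \\ 0 \end{bmatrix} \quad\text{is}\quad \begin{aligned} A^T e &= 0 \\ A^T(b - Au) &= 0 \\ A^T A u &= A^T b. \end{aligned} \tag{2}$$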



Graphically, Figure 7.1 shows Au as the projection of b. It is the combination of 
columns of A (the point in the column space) that is nearest to b. We studied least 
squares in Section 2.3, and now we notice that a second problem is solved at the 
same time. 

This second problem (dual problem) does not project $b$ down onto the column
space. Instead it projects $b$ across onto the perpendicular space. In the 3D picture,
that space is a line (its dimension is $3 - 2 = 1$). In $m$ dimensions that perpendicular
subspace has dimension $m - n$. It contains the vectors that are perpendicular to all
columns of $A$. The line in Figure 7.1 is the nullspace of $A^T$.

One of the vectors in that perpendicular space is $e$ = projection of $b$! Together, $e$
and $u$ solve the two linear equations that express exactly what the figure shows:



Primal-Dual
Saddle Point
Kuhn-Tucker (KKT)

$$\begin{aligned} e + Au &= b && m \text{ equations} \\ A^T e &= 0 && n \text{ equations} \end{aligned} \tag{3}$$



We took this chance to write down three names for these very simple but so
fundamental equations. I can quickly say a few words about each name.

Primal-Dual The primal problem is to minimize $\|Au - b\|^2$. This produces $u$. The
dual problem is to minimize $\|w - b\|^2$, under the condition that $A^T w = 0$. This
produces $e$. We can't solve one problem without solving the other. They are solved
together by equation (3), which finds the projections in both directions.
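
A small numerical check can make equation (3) concrete. The sketch below is only an illustration, not taken from the text: the 3 by 2 matrix `A`, the vector `b`, and the helper `eliminate` are assumptions chosen for the example. It solves the normal equations for $u$, then solves the block system $e + Au = b$, $A^T e = 0$ for $e$ and $u$ together, in exact fractions.

```python
from fractions import Fraction

def eliminate(M, rhs):
    """Solve M x = rhs by elimination without row exchanges (exact fractions)."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(M, rhs)]
    for k in range(n):
        for i in range(k + 1, n):
            factor = aug[i][k] / aug[k][k]
            for j in range(k, n + 1):
                aug[i][j] -= factor * aug[k][j]
    x = [Fraction(0)] * n
    for i in reversed(range(n)):
        known = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - known) / aug[i][i]
    return x

# Hypothetical data: m = 3 measurements, n = 2 unknowns.
A = [[1, 0], [1, 1], [1, 2]]
b = [6, 0, 0]
m, n = len(A), len(A[0])

# Primal route: normal equations A^T A u = A^T b.
AtA = [[sum(A[k][i] * A[k][j] for k in range(m)) for j in range(n)] for i in range(n)]
Atb = [sum(A[k][i] * b[k] for k in range(m)) for i in range(n)]
u_normal = eliminate(AtA, Atb)

# Primal-dual route: block system [I A; A^T 0] [e; u] = [b; 0].
S = [[int(i == j) for j in range(m)] + A[i] for i in range(m)]
S += [[A[k][i] for k in range(m)] + [0] * n for i in range(n)]
solution = eliminate(S, b + [0] * n)
e, u = solution[:m], solution[m:]

# A^T e should vanish: e is perpendicular to every column of A.
At_e = [sum(A[k][i] * e[k] for k in range(m)) for i in range(n)]
print("u from normal equations:", u_normal)
print("e and u from block system:", e, u)
print("A^T e:", At_e)
```

Both routes should give the same $u$, and the block system delivers the residual $e$ at no extra cost.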

Saddle Point The block matrix $S$ in those equations is not positive definite!

Saddle point matrix
$$S = \begin{bmatrix} I & A \\ A^T & 0 \end{bmatrix} \tag{4}$$



The first $m$ pivots are all 1's, from the matrix $I$. When elimination puts zeros in
place of $A^T$, it puts the negative definite $-A^T A$ into the zero block.



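Multiply row 1 by $A^T$
Subtract from row 2

$$\begin{bmatrix} I & 0 \\ -A^T & I \end{bmatrix} \begin{bmatrix} I & A \\ A^T & 0 \end{bmatrix} = \begin{bmatrix} I & A \\ 0 & -A^T A \end{bmatrix} \tag{5}$$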



That elimination produced $-A^T A$ in the (2,2) block (the "Schur complement"). So
the final $n$ pivots will all be negative. $S$ is indefinite, with pivots of both signs.
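
The pivot signs can be checked on a small case. This is a minimal sketch under stated assumptions: the matrix `A` is an arbitrary 3 by 2 example with independent columns, and `pivots` is a hypothetical helper that eliminates without row exchanges. It compares the pivots of $S$ with the pivots of the Schur complement $-A^T A$.

```python
from fractions import Fraction

def pivots(M):
    """Pivots from elimination without row exchanges (exact fractions)."""
    U = [[Fraction(v) for v in row] for row in M]
    n = len(U)
    for k in range(n):
        for i in range(k + 1, n):
            factor = U[i][k] / U[k][k]
            for j in range(k, n):
                U[i][j] -= factor * U[k][j]
    return [U[k][k] for k in range(n)]

# Illustrative 3 by 2 matrix with independent columns (not from the text).
A = [[1, 0], [1, 1], [1, 2]]
m, n = len(A), len(A[0])

# Saddle point matrix S = [I A; A^T 0].
S = [[int(i == j) for j in range(m)] + A[i] for i in range(m)]
S += [[A[k][i] for k in range(m)] + [0] * n for i in range(n)]

# Schur complement -A^T A, the block that elimination leaves in the corner.
minus_AtA = [[-sum(A[k][i] * A[k][j] for k in range(m)) for j in range(n)]
             for i in range(n)]

S_pivots = pivots(S)
schur_pivots = pivots(minus_AtA)
print("pivots of S:", S_pivots)
print("pivots of -A^T A:", schur_pivots)
```

The first $m$ pivots of $S$ are 1's, and the last $n$ agree with the negative pivots of $-A^T A$.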

$S$ doesn't produce a pure minimum or maximum, positive or negative definite. It
leads to a saddle point $(u, e)$. When we get up more courage, we will try to draw this.

Kuhn-Tucker These are the names most often associated with equations like (3)
that solve optimization problems. Because of an earlier Master's Thesis by Karush,
you often see "KKT equations." In continuous problems, for functions instead of
vectors, the right name would be "Euler-Lagrange equations." When the constraints
include inequalities like $w \ge 0$ or $Bu = d$, Lagrange multipliers are still the key. For
those more delicate problems, Kuhn and Tucker earned their fame.

Figure 7.1: Ordinary and weighted least squares: min $\|b - Au\|^2$ and $\|Wb - WAu\|^2$.
(Labels in the figure: nullspace of $A^T$, nullspace of $A^T C$, $e = b - Au$, projection of $b$, column space of $A$.)




Weighted Least Squares 

This is a small but very important extension of the least squares problem. It involves
the same rectangular $A$, and a square weighting matrix $W$. Instead of $u$ we write $u_W$
(this best answer changes with $W$). You will see that the symmetric positive definite
combination $C = W^T W$ is what matters in the end.



Weighted least squares             Minimize $\|WAu - Wb\|^2$
Normal equations for $u_W$           $(WA)^T (WA) u_W = (WA)^T (Wb)$          (6)



No new mathematics, just replace A and b by WA and Wb. The equation has become 

$A^T W^T W A u_W = A^T W^T W b$ or $A^T C A u_W = A^T C b$ or $A^T C (b - A u_W) = 0$. (7)

In the middle equation is that all-important matrix $A^T C A$. In the last equation,
$A^T e = 0$ has changed to $A^T C e = 0$. When I made that change in Figure 7.1,
I lost the 90° angles. The line is no longer perpendicular to the plane, and the
projections are no longer orthogonal. We are still splitting $b$ into two pieces, $A u_W$
in the column space and $e$ in the nullspace of $A^T C$. The equations now include this
$C = W^T W$:

$e$ is "C-orthogonal" to the columns of $A$
$$\begin{aligned} e + A u_W &= b \\ A^T C e &= 0 \end{aligned} \tag{8}$$

With a simple change, the equations (8) become symmetric! Introduce $w = Ce$
and $e = C^{-1} w$ and shorten $u_W$ to $u$:

Primal-Dual
Saddle Point
Kuhn-Tucker
$$\begin{aligned} C^{-1} w + Au &= b \\ A^T w &= 0 \end{aligned} \tag{9}$$

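This weighted saddle point matrix replaces $I$ by $C^{-1}$ (still symmetric):

Saddle point matrix
$$S = \begin{bmatrix} C^{-1} & A \\ A^T & 0 \end{bmatrix} \quad \begin{matrix} m \text{ rows} \\ n \text{ rows} \end{matrix}$$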



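Elimination produces $m$ positive pivots from $C^{-1}$, and $n$ negative pivots from $-A^T C A$:

$$\begin{bmatrix} C^{-1} & A \\ A^T & 0 \end{bmatrix} \begin{bmatrix} w \\ u \end{bmatrix} = \begin{bmatrix} b \\ 0 \end{bmatrix} \;\longrightarrow\; \begin{bmatrix} C^{-1} & A \\ 0 & -A^T C A \end{bmatrix} \begin{bmatrix} w \\ u \end{bmatrix} = \begin{bmatrix} b \\ -A^T C b \end{bmatrix} \tag{10}$$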



The "Schur complement" that appears in the 2, 2 block becomes $-A^T C A$. We are
back to $A^T C A u = A^T C b$ and all its applications.
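
Here is a hedged numerical sketch of the weighted equations. The data `A`, `b`, the diagonal weights `c` (standing for a diagonal $C = W^T W$), and the helper `solve2` are assumptions made for illustration. It solves $A^T C A u_W = A^T C b$ and confirms that the residual is C-orthogonal to the columns of $A$, while ordinary orthogonality is lost.

```python
from fractions import Fraction

def solve2(K, r):
    """Solve a 2 by 2 system K u = r by Cramer's rule (exact fractions)."""
    det = K[0][0] * K[1][1] - K[0][1] * K[1][0]
    return [Fraction(r[0] * K[1][1] - K[0][1] * r[1], det),
            Fraction(K[0][0] * r[1] - K[1][0] * r[0], det)]

# Hypothetical data: same A and b as an unweighted fit, plus weights.
A = [[1, 0], [1, 1], [1, 2]]
b = [6, 0, 0]
c = [1, 2, 1]                      # diagonal entries of C = W^T W
m, n = len(A), len(A[0])

# Weighted normal equations A^T C A u = A^T C b.
K = [[sum(c[k] * A[k][i] * A[k][j] for k in range(m)) for j in range(n)]
     for i in range(n)]
rhs = [sum(c[k] * A[k][i] * b[k] for k in range(m)) for i in range(n)]
u_w = solve2(K, rhs)

# Residual e = b - A u_W and the two inner-product tests.
e = [b[k] - sum(A[k][j] * u_w[j] for j in range(n)) for k in range(m)]
At_C_e = [sum(A[k][i] * c[k] * e[k] for k in range(m)) for i in range(n)]
At_e = [sum(A[k][i] * e[k] for k in range(m)) for i in range(n)]
print("u_W:", u_w)
print("A^T C e:", At_C_e)   # zero: C-orthogonal
print("A^T e:", At_e)       # not zero: the 90 degree angles are lost
```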

Two more steps will finish this overview of optimization. We show how a different
vector $f$ can appear on the right side. The bigger step is also taken by our second
example, coming now. The dual problem (for $w$ not $u$) has a constraint. At first it
was $A^T e = 0$, now it is $A^T w = 0$, and in the example it will be $A^T w = f$.

How do you minimize a function of $e$ or $w$ when these constraints
are enforced? Lagrange showed the way, with his multipliers.



Minimizing with Constraints 

The second example is a line of two springs and one mass. The function to minimize
is the energy in the springs. The constraint is the balance $A^T w = f$ between internal
forces (in the springs) and the external force (on the mass). I believe you can see
in Figure 7.2 the fundamental problem of constrained optimization. The forces are
drawn as if both springs are stretched with forces $f > 0$, pulling on the mass. Actually
spring 2 will be compressed ($w_2$ is negative).

As I write those words (spring, mass, energy, force balance), I am desperately
hoping that you won't just say "this is not my area." Changing the example to
another area of science or engineering or economics would be easy: the problem stays
the same in all languages.

All of calculus trains us to minimize functions: Set the derivative to zero! But
the basic calculus course doesn't deal properly with constraints. We are minimizing
an energy function $E(w_1, w_2)$, but we are constrained to stay on the line $w_1 - w_2 = f$.
What derivatives do we set to zero?

A direct approach is to solve the constraint equation. Replace $w_2$ by $w_1 - f$. That
seems natural, but I want to advocate a different approach (which leads to the same
result). Instead of looking for $w$'s that satisfy the constraint, the idea of Lagrange is
to build the constraints into the function. Rather than removing $w_2$, we will add a
new unknown $u$. It might seem surprising, but this second approach is better.

Figure 7.2: Minimum spring energy $E(w)$ subject to balance of forces.
(Labels in the figure: spring 1 with force $w_1$, mass, external force $f$, spring 2 with force $w_2$.)

Internal energy in the springs: $E(w) = E_1(w_1) + E_2(w_2)$
Balance of internal/external forces: $w_1 - w_2 = f$ (the constraint)
Constrained optimization problem: Minimize $E(w)$ subject to $w_1 - w_2 = f$

With $n$ constraints on $m$ unknowns, Lagrange's method has $m + n$ unknowns. The
idea is to add a Lagrange multiplier for each constraint. (Books on optimization
call this multiplier $\lambda$ or $\pi$, we will call it $u$.) Our Lagrange function $L$ has the
constraint $w_1 - w_2 - f = 0$ built in, and multiplied by $-u$:

Lagrange function   $L(w_1, w_2, u) = E_1(w_1) + E_2(w_2) - u(w_1 - w_2 - f)$.









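Calculus can operate on $L$, by setting derivatives (three partial derivatives!) to zero:

Kuhn-Tucker optimality equations
$$\frac{\partial L}{\partial w_1} = \frac{dE_1}{dw}(w_1) - u = 0 \tag{11a}$$
$$\frac{\partial L}{\partial w_2} = \frac{dE_2}{dw}(w_2) + u = 0 \tag{11b}$$

Lagrange multiplier $u$
$$\frac{\partial L}{\partial u} = -(w_1 - w_2 - f) = 0 \tag{11c}$$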



Notice how the third equation $\partial L/\partial u = 0$ automatically brings back the constraint,
because it was just multiplied by $-u$. If we add the first two equations to eliminate $u$,
and substitute $w_1 - f$ for $w_2$, we are back to the direct approach with one unknown.

But we don't want to eliminate $u$! That Lagrange multiplier is an important
number with a meaning of its own. In this problem, $u$ is the displacement of the
mass. In economics, $u$ is the selling price to maximize profit. In all problems, $u$
measures the sensitivity of the answer (the minimum energy $E_{\min}$) to a change in
the constraint. We will see this sensitivity $dE_{\min}/df$ in the linear case.

Linear Case



The force in a linear spring is proportional to the elongation $e$, by Hooke's Law
$w = ce$. Each small stretching step requires work = (force)(movement) = $(ce)(\Delta e)$.
Then the integral $\tfrac12 c e^2$ that adds up those small steps gives the energy stored in the
spring. We can express this energy in terms of $e$ or $w$:

Energy in a spring   $E(w) = \frac12 c e^2 = \frac12 \frac{w^2}{c}$.   (12)

Our problem is to minimize a quadratic energy $E(w)$ subject to a linear balance
equation $w_1 - w_2 = f$. This is the model problem of optimization.

Minimize $E(w) = \frac12 \frac{w_1^2}{c_1} + \frac12 \frac{w_2^2}{c_2}$ subject to $w_1 - w_2 = f$.   (13)



We want to solve this model problem by geometry and then by algebra. 

Geometry In the plane of $w_1$ and $w_2$, draw the line $w_1 - w_2 = f$. Then draw the
ellipse $E(w) = E_{\min}$ that just touches this line. The line is tangent to the ellipse. A
smaller ellipse from smaller forces $w_1$ and $w_2$ will not reach the line; those forces will
not balance $f$. A larger ellipse will not give minimum energy. This ellipse touches
the line at the point $(w_1, w_2)$ that minimizes $E(w)$.




Figure 7.3: The ellipse $E(w) = E_{\min}$ touches $w_1 - w_2 = f$ at the solution $(w_1, w_2)$.



At the touching point in Figure 7.3, the perpendiculars $(1, -1)$ and $(u, -u)$ to
the line and the ellipse are parallel. The perpendicular to the line is the vector
$(1, -1)$ from the partial derivatives of $w_1 - w_2 - f$. The perpendicular to the ellipse
is $(\partial E/\partial w_1, \partial E/\partial w_2)$, from the gradient of $E(w)$. By the optimality equations (11a)
and (11b), this is exactly $(u, -u)$. Those parallel gradients at the solution are the
algebraic statement that the line is tangent to the ellipse.

Algebra To find $(w_1, w_2)$, start with the derivatives of $w_1^2/2c_1$ and $w_2^2/2c_2$:

Energy gradient   $\frac{\partial E}{\partial w_1} = \frac{w_1}{c_1}$ and $\frac{\partial E}{\partial w_2} = \frac{w_2}{c_2}$.   (14)

Equations (11a) and (11b) in Lagrange's method become $w_1/c_1 = u$ and $w_2/c_2 = -u$.
Now the constraint $w_1 - w_2 = f$ yields $(c_1 + c_2)u = f$ (both $w$'s are eliminated):

Substitute $w_1 = c_1 u$ and $w_2 = -c_2 u$. Then $(c_1 + c_2)u = f$.   (15)

I don't know if you recognize $c_1 + c_2$ as our stiffness matrix $A^T C A$! This problem
is so small that you could easily miss $K = A^T C A$. The matrix $A^T$ in the constraint
equation $A^T w = w_1 - w_2 = f$ is only 1 by 2, so the stiffness matrix $K$ is 1 by 1:



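$$A^T = \begin{bmatrix} 1 & -1 \end{bmatrix} \quad\text{and}\quad K = A^T C A = \begin{bmatrix} 1 & -1 \end{bmatrix} \begin{bmatrix} c_1 & \\ & c_2 \end{bmatrix} \begin{bmatrix} 1 \\ -1 \end{bmatrix} = \begin{bmatrix} c_1 + c_2 \end{bmatrix}. \tag{16}$$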



The algebra of Lagrange's method has recovered $Ku = f$. Its solution is the movement
$u = f/(c_1 + c_2)$ of the mass. Equation (15) eliminated $w_1$ and $w_2$ using (11a) and (11b).
Now back substitution finds those energy-minimizing forces:

Spring forces   $w_1 = c_1 u = \frac{c_1 f}{c_1 + c_2}$ and $w_2 = -c_2 u = \frac{-c_2 f}{c_1 + c_2}$.   (17)

Those forces $(w_1, w_2)$ are on the ellipse of minimum energy $E_{\min}$, tangent to the line:

$$E(w) = \frac12 \frac{w_1^2}{c_1} + \frac12 \frac{w_2^2}{c_2} = \frac12 \frac{c_1 f^2}{(c_1 + c_2)^2} + \frac12 \frac{c_2 f^2}{(c_1 + c_2)^2} = \frac12 \frac{f^2}{c_1 + c_2} = E_{\min}. \tag{18}$$

This $E_{\min}$ must be the same minimum value $\tfrac12 f^T K^{-1} f$ as in Section [...]. It is.

We can directly verify the mysterious fact that $u$ measures the sensitivity of $E_{\min}$
to a small change in $f$. Compute the derivative $dE_{\min}/df$:

Lagrange multiplier = Sensitivity   $\frac{d}{df}\left(\frac12 \frac{f^2}{c_1 + c_2}\right) = \frac{f}{c_1 + c_2} = u$.   (19)

This sensitivity is linked to the observation in Figure 7.3 that one gradient is $u$ times
the other gradient. From (11a) and (11b), that stays true for nonlinear springs.
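
The linear case is small enough to check by direct computation. In this sketch the function name `two_springs` and the sample values of `c1`, `c2`, `f` are assumptions for illustration only. It follows the steps above: solve $(c_1 + c_2)u = f$, back substitute for the forces, evaluate the energy, and estimate $dE_{\min}/df$ by a central difference (which is exact for a quadratic).

```python
from fractions import Fraction

def two_springs(c1, c2, f):
    """Minimize E(w) subject to w1 - w2 = f by solving (c1 + c2) u = f."""
    u = Fraction(f) / (c1 + c2)          # Lagrange multiplier = displacement
    w1, w2 = c1 * u, -c2 * u             # back substitution for the forces
    energy = w1**2 / (2 * c1) + w2**2 / (2 * c2)
    return u, w1, w2, energy

# Hypothetical spring constants and external force.
c1, c2, f = 2, 3, 10
u, w1, w2, e_min = two_springs(c1, c2, f)

# Sensitivity dE_min/df by a central difference in f.
h = Fraction(1, 100)
e_plus = two_springs(c1, c2, f + h)[3]
e_minus = two_springs(c1, c2, f - h)[3]
sensitivity = (e_plus - e_minus) / (2 * h)

print("u =", u, " w1 =", w1, " w2 =", w2)
print("force balance w1 - w2 =", w1 - w2)
print("E_min =", e_min, " dE_min/df =", sensitivity)
```

Spring 2 comes out with a negative force (compression), and the computed sensitivity matches the multiplier $u$.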



A Specific Example 

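I want to insert $c_1 = c_2 = 1$ in this model problem, to see the saddle point of $L$ more
clearly. The Lagrange function with built-in constraint depends on $w_1$ and $w_2$ and $u$:

$$L = \tfrac12 w_1^2 + \tfrac12 w_2^2 - u w_1 + u w_2 + u f. \tag{20}$$

The equations $\partial L/\partial w_1 = 0$ and $\partial L/\partial w_2 = 0$ and $\partial L/\partial u = 0$ produce a beautiful
symmetric saddle-point matrix $S$:

$$\begin{aligned} \partial L/\partial w_1 &= w_1 - u = 0 \\ \partial L/\partial w_2 &= w_2 + u = 0 \\ \partial L/\partial u &= -w_1 + w_2 + f = 0 \end{aligned} \quad\text{or}\quad \begin{bmatrix} 1 & 0 & -1 \\ 0 & 1 & 1 \\ -1 & 1 & 0 \end{bmatrix} \begin{bmatrix} w_1 \\ w_2 \\ u \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ -f \end{bmatrix} \tag{21}$$

Is this matrix $S$ positive definite? No. It is invertible, and its pivots are $1, 1, -2$.
That $-2$ destroys positive definiteness; it means a saddle point:

$$S = \begin{bmatrix} 1 & 0 & -1 \\ 0 & 1 & 1 \\ -1 & 1 & 0 \end{bmatrix} \;\xrightarrow{\text{Elimination}}\; \begin{bmatrix} 1 & 0 & -1 \\ 0 & 1 & 1 \\ 0 & 0 & -2 \end{bmatrix} \quad\text{with}\quad L = \begin{bmatrix} 1 & & \\ 0 & 1 & \\ -1 & 1 & 1 \end{bmatrix}$$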



On a symmetric matrix, elimination equals "completing the square." The pivots
$1, 1, -2$ are outside the squares. The entries of $L$ are inside the squares:

$$\tfrac12 w_1^2 + \tfrac12 w_2^2 - u w_1 + u w_2 = \tfrac12 \left[ 1(w_1 - u)^2 + 1(w_2 + u)^2 - 2(u)^2 \right]. \tag{22}$$

The first squares $(w_1 - u)^2$ and $(w_2 + u)^2$ go "upwards," but $-2u^2$ goes down. This
gives a saddle point SP = $(w_1, w_2, u)$ in Figure 7.4.

The eigenvalues of a symmetric matrix have the same signs as the pivots, and the
same product (which is $\det S = -2$). Here the eigenvalues are $\lambda = 1, 2, -1$.
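
These claims about the 3 by 3 saddle-point matrix can be verified in a few lines. The helpers `pivots`, `det3`, and `shifted` below are hypothetical names introduced for this sketch. It checks the pivots, the determinant, that $\det(S - \lambda I) = 0$ for $\lambda = 1, 2, -1$, and that the completed squares in (22) agree with the quadratic at a few sample points.

```python
from fractions import Fraction

S = [[1, 0, -1],
     [0, 1, 1],
     [-1, 1, 0]]

def pivots(M):
    """Pivots from elimination without row exchanges (exact fractions)."""
    U = [[Fraction(v) for v in row] for row in M]
    n = len(U)
    for k in range(n):
        for i in range(k + 1, n):
            factor = U[i][k] / U[k][k]
            for j in range(k, n):
                U[i][j] -= factor * U[k][j]
    return [U[k][k] for k in range(n)]

def det3(M):
    """Determinant of a 3 by 3 matrix by cofactors of the first row."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def shifted(M, lam):
    """Return M - lam * I."""
    return [[M[r][c] - (lam if r == c else 0) for c in range(3)] for r in range(3)]

S_pivots = pivots(S)
S_det = det3(S)
# Each candidate eigenvalue should make S - lambda I singular.
char_values = [det3(shifted(S, lam)) for lam in (1, 2, -1)]

def quadratic(w1, w2, u):
    """Left side of (22)."""
    return Fraction(1, 2) * (w1**2 + w2**2) - u * w1 + u * w2

def squares(w1, w2, u):
    """Right side of (22): pivots 1, 1, -2 outside the squares."""
    return Fraction(1, 2) * (1 * (w1 - u)**2 + 1 * (w2 + u)**2 - 2 * u**2)

samples = [(1, 2, 3), (-4, 0, 5), (7, -2, -1)]
identity_holds = all(quadratic(*p) == squares(*p) for p in samples)

print("pivots:", S_pivots, " det:", S_det)
print("det(S - lambda I) for lambda = 1, 2, -1:", char_values)
print("completing the square agrees:", identity_holds)
```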



Figure 7.4: $(w_1 - u)^2$ and $(w_2 + u)^2$ go up, $-2u^2$ goes down from the saddle point SP.
(Labels in the figure: $L(w_1, w_2, u) = \tfrac12 [(w_1 - u)^2 + (w_2 + u)^2 - 2u^2] + uf$. Four dimensions make it a squeeze. Saddle point SP $= (w_1, w_2, u) = (c_1 f, -c_2 f, f)/(c_1 + c_2)$.)



The Fundamental Problem 

May I describe the full linear case with $w = (w_1, \ldots, w_m)$ and $A^T w = (f_1, \ldots, f_n)$?
The problem is to minimize the total energy $E(w) = \tfrac12 w^T C^{-1} w$ in the $m$ springs. The
$n$ constraints $A^T w = f$ are built in by Lagrange multipliers $u_1, \ldots, u_n$. Multiplying
the force balance on the $k$th mass by $-u_k$ and adding, all $n$ constraints are built into
the dot product $u^T(A^T w - f)$. For mechanics, we use a minus sign in $L$:

Lagrange function   $L(w, u) = \tfrac12 w^T C^{-1} w - u^T(A^T w - f)$.   (23)

To find the minimizing $w$, set the $m + n$ first partial derivatives of $L$ to zero:

Kuhn-Tucker optimality equations
$$\partial L/\partial w = C^{-1} w - Au = 0 \tag{24a}$$
$$\partial L/\partial u = -A^T w + f = 0 \tag{24b}$$

This is the main point, that Lagrange multipliers lead exactly to the linear equations
$w = CAu$ and $A^T w = f$ that we studied in the first chapters of the book. By using
$-u$ in the Lagrange function $L$ and introducing $e = Au$, we have the plus signs that
appeared for springs and masses:

$$e = Au \qquad w = Ce \qquad f = A^T w \quad\Longrightarrow\quad A^T C A u = f.$$



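Important Least squares problems have $e = b - Au$ (minus sign from voltage
drops). Then we change to $+u$ in $L$. The energy $E = \tfrac12 w^T C^{-1} w - b^T w$ now has
a term involving $b$. When Lagrange sets derivatives of $L$ to zero, he finds $S$!

$$\begin{aligned} \partial L/\partial w &= C^{-1} w + Au - b = 0 \\ \partial L/\partial u &= A^T w - f = 0 \end{aligned} \quad\text{or}\quad \begin{bmatrix} C^{-1} & A \\ A^T & 0 \end{bmatrix} \begin{bmatrix} w \\ u \end{bmatrix} = \begin{bmatrix} b \\ f \end{bmatrix} \tag{25}$$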



This system is my top candidate for the fundamental problem of scientific computing. 

You could eliminate $w = C(b - Au)$ but I don't know if you should. If you do it,
$K = A^T C A$ will appear. Usually this is a good plan:

Remove $w$   $A^T w = A^T C(b - Au) = f$ which is $A^T C A u = A^T C b - f$.   (26)
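
The two routes, the full block system (25) and the reduced system (26), can be compared numerically. The sketch below is an illustration under assumptions: `A`, the diagonal of `C`, `b`, `f`, and the helper `eliminate` are made up for the example and are not from the text.

```python
from fractions import Fraction

def eliminate(M, rhs):
    """Solve M x = rhs by elimination without row exchanges (exact fractions)."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(M, rhs)]
    for k in range(n):
        for i in range(k + 1, n):
            factor = aug[i][k] / aug[k][k]
            for j in range(k, n + 1):
                aug[i][j] -= factor * aug[k][j]
    x = [Fraction(0)] * n
    for i in reversed(range(n)):
        known = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - known) / aug[i][i]
    return x

# Hypothetical data with m = 3, n = 2 and a diagonal C.
A = [[1, 0], [1, 1], [1, 2]]
c = [1, 2, 1]
b = [6, 0, 0]
f = [1, 0]
m, n = len(A), len(A[0])

# Route 1: the block system [C^{-1} A; A^T 0] [w; u] = [b; f].
S = [[Fraction(int(i == j), c[i]) for j in range(m)] + A[i] for i in range(m)]
S += [[A[k][i] for k in range(m)] + [0] * n for i in range(n)]
wu = eliminate(S, b + f)
w_block, u_block = wu[:m], wu[m:]

# Route 2: remove w, solve A^T C A u = A^T C b - f, then w = C(b - Au).
K = [[sum(c[k] * A[k][i] * A[k][j] for k in range(m)) for j in range(n)]
     for i in range(n)]
rhs = [sum(c[k] * A[k][i] * b[k] for k in range(m)) - f[i] for i in range(n)]
u_reduced = eliminate(K, rhs)
w_reduced = [c[k] * (b[k] - sum(A[k][j] * u_reduced[j] for j in range(n)))
             for k in range(m)]

# The constraint A^T w = f should hold for the computed w.
At_w = [sum(A[k][i] * w_block[k] for k in range(m)) for i in range(n)]
print("u (block, reduced):", u_block, u_reduced)
print("w (block, reduced):", w_block, w_reduced)
print("A^T w:", At_w)
```

Whether to eliminate $w$ is a design choice: the reduced matrix $K$ is smaller and positive definite, while the block form keeps $w$ and $u$ on an equal footing.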



Duality and Saddle Points 

We minimize the energy $E(w)$ but we do not minimize Lagrange's function $L(w, u)$.
It is true that $\partial L/\partial w$ and $\partial L/\partial u$ are zero at the solution. But the matrix of second
derivatives of $L$ is not positive definite. The solution $(w, u)$ is a saddle point. It is
a minimum of $L$ over $w$, and at the same time it is a maximum of $L$ over $u$. A saddle
point has something to do with a horse... It is like the lowest point in a mountain
range, which is also the highest point as you go across.

The minimax theorem states that we can minimize first (over $w$) or maximize
first (over $u$). Either order leads to the unique saddle point, given by $\partial L/\partial w = 0$
and $\partial L/\partial u = 0$. The minimization removes $w$ from the problem, and it corresponds
exactly to eliminating $w$ from the equations (11) for "first derivatives = 0." Every
step will be illustrated by examples (linear case first).

Allow me to use the $A$, $C$, $A^T$ notation that we already know, so I can point out
the fantastic idea of duality. The Lagrangian is $L(w, u) = \tfrac12 w^T C^{-1} w - u^T(A^T w - f)$,
leaving out $b$ for simplicity. We compare minimization first and maximization first:

Minimize over $w$   $\partial L/\partial w = 0$ when $C^{-1} w - Au = 0$. Then $w = CAu$ and
$L = -\tfrac12 (Au)^T C (Au) + u^T f$. This is to be maximized.

Maximize over $u$   Key point: $L_{\max} = +\infty$ if $A^T w \ne f$. Minimizing over $w$
will keep $A^T w = f$ to avoid that $+\infty$. Then $L = \tfrac12 w^T C^{-1} w$.

The maximum of a linear function like $5u$ is $+\infty$. But the maximum of $0u$ is 0.

Now come the two second steps, maximize over $u$ and minimize over $w$. Those
are the dual problems. Often one problem is called the "primal" and the other
is its "dual," but they are reversible. Here are the two dual principles for linear
springs:

1. Choose $u$ to maximize $-\tfrac12 (Au)^T C (Au) + u^T f$. Call this function $-P(u)$.

2. Choose $w$ to minimize $\tfrac12 w^T C^{-1} w$ keeping the force balance $A^T w = f$.

Our original problem was 2. In the language of mechanics, we were minimizing the
complementary energy $E(w)$. Its dual problem is 1. This minimizes the potential
energy $P(u)$, by maximizing $-P(u)$. Most finite element systems choose that
"displacement method." They work with $u$ because it avoids the constraint $A^T w = f$.

The dual problems 1 and 2 involve the same inputs $A$, $C$, $f$ but they look entirely
different. Equality between minimax($L$) and maximin($L$) gives the duality principle
and the saddle point:

Duality of 1 and 2   $\max_{\text{all } u} \big(-P(u)\big) = \min_{A^T w = f} E(w)$.   (27)

That is the big theorem of optimization: the maximum of one problem equals
the minimum of its dual. We can find those numbers, and see that they are
equal, because the derivatives are linear. Maximizing $-P(u)$ will minimize $P(u)$,
which is the problem we solved in Chapter 1. Write $u^*$ and $w^*$ for the optimal
$u$ and $w$:

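1. $P(u) = \tfrac12 u^T A^T C A u - u^T f$ is $P_{\min} = -\tfrac12 f^T (A^T C A)^{-1} f$ when $u^* = (A^T C A)^{-1} f$

2. $E(w) = \tfrac12 w^T C^{-1} w$ is $E_{\min} = \tfrac12 f^T (A^T C A)^{-1} f$ when $w^* = CAu^*$.

So $u^* = K^{-1} f$ and $w^* = CAu^*$ give the saddle point $(w^*, u^*)$ of $L$. This is where
$E_{\min} = -P_{\min}$.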
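
Duality can be watched in action on the two-spring model, where $A^T = [1 \;\; -1]$ and $K = c_1 + c_2$. This is a sketch with assumed sample values for `c1`, `c2`, `f`, and the function names are hypothetical. It checks that $-P(u) \le E(w)$ for a grid of $u$ and of feasible $w$, that the gap equals $\tfrac12 (w - CAu)^T C^{-1} (w - CAu)$, and that the two values meet at the saddle point.

```python
from fractions import Fraction

# Hypothetical spring constants and external force.
c1, c2, f = 2, 3, 10
K = c1 + c2                      # stiffness A^T C A for A^T = [1 -1]

def minus_P(u):
    """-P(u) = -1/2 (c1 + c2) u^2 + u f, to be maximized over all u."""
    return -Fraction(K, 2) * u**2 + u * f

def E(w1, w2):
    """Complementary energy, to be minimized over w1 - w2 = f."""
    return Fraction(w1)**2 / (2 * c1) + Fraction(w2)**2 / (2 * c2)

def gap_formula(w1, w2, u):
    """1/2 (w - CAu)^T C^{-1} (w - CAu) with CAu = (c1 u, -c2 u)."""
    return (Fraction(w1 - c1 * u)**2 / (2 * c1)
            + Fraction(w2 + c2 * u)**2 / (2 * c2))

# Weak duality on a grid: every feasible w has w2 = w1 - f.
weak_duality_holds = True
for u in range(-3, 6):
    for w1 in range(-5, 10):
        w2 = w1 - f
        gap = E(w1, w2) - minus_P(u)
        if gap < 0 or gap != gap_formula(w1, w2, u):
            weak_duality_holds = False

# Saddle point: u* = f / K and w* = C A u*.
u_star = Fraction(f, K)
w_star = (c1 * u_star, -c2 * u_star)
print("weak duality on the grid:", weak_duality_holds)
print("max of -P:", minus_P(u_star), " min of E:", E(*w_star))
```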



Problem Set 7.1 



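1 Our model matrix $M$ in (21) has eigenvalues $\lambda_1 = 1$, $\lambda_2 = 2$, $\lambda_3 = -1$:

$$M = \begin{bmatrix} 1 & 0 & -1 \\ 0 & 1 & 1 \\ -1 & 1 & 0 \end{bmatrix}$$

The trace $1 + 1 + 0$ down the diagonal of $M$ equals the sum of $\lambda$'s. Check that
$\det M$ = product of $\lambda$'s. Find eigenvectors $x_1, x_2, x_3$ of unit length for those
eigenvalues $1, 2, -1$. The eigenvectors are orthogonal!

2 The quadratic part of the Lagrangian $L(w_1, w_2, u)$ comes directly from $M$:

$$\frac12 \begin{bmatrix} w_1 & w_2 & u \end{bmatrix} \begin{bmatrix} 1 & 0 & -1 \\ 0 & 1 & 1 \\ -1 & 1 & 0 \end{bmatrix} \begin{bmatrix} w_1 \\ w_2 \\ u \end{bmatrix} = \frac12 \left( w_1^2 + w_2^2 - 2uw_1 + 2uw_2 \right).$$

Put the unit eigenvectors $x_1, x_2, x_3$ inside the squares and $\lambda = 1, 2, -1$ outside:

$$w_1^2 + w_2^2 - 2uw_1 + 2uw_2 = 1(\qquad)^2 + 2(\qquad)^2 - 1(\qquad)^2.$$

The first parentheses contain $(1w_1 + 1w_2 + 0u)/\sqrt{2}$ because $x_1$ is $(1, 1, 0)/\sqrt{2}$
(check that $M x_1 = x_1$ for the matrix $M$ above).
Compared with (22), these squares come from orthogonal eigenvector directions.
We are using $A = Q \Lambda Q^T$ instead of $A = LDL^T$.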

3 Weak duality Half of the duality theorem is $\max(-P(u)) \le \min E(w)$. This is
surprisingly easy to prove. Show that $-P(u)$ is never larger than $E(w)$, for
every $u$ and $w$ with $A^T w = f$.

$$-\tfrac12 u^T A^T C A u + u^T f \le \tfrac12 w^T C^{-1} w \quad\text{whenever}\quad A^T w = f.$$

Set $f = A^T w$. Verify that (right side) - (left side) = $\tfrac12 (w - CAu)^T C^{-1} (w - CAu) \ge 0$.

Equality holds and $\max(-P) = \min(E)$ when $w = CAu$. That is equation (11)!

4 Suppose the lower spring in Figure 7.2 is not fixed at the bottom. A mass at
that end adds a new force balance constraint $w_2 - f_2 = 0$. Build the old and
new constraints into the Lagrange function $L(w_1, w_2, u_1, u_2)$ to minimize the
energy $E_1(w_1) + E_2(w_2)$. Write down four equations like (11a)-(11c): partial
derivatives of $L$ are zero.

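5 For spring energies $E_1 = \tfrac12 w_1^2/c_1$ and $E_2 = \tfrac12 w_2^2/c_2$, find $A$ in the block form

$$\begin{bmatrix} C^{-1} & A \\ A^T & 0 \end{bmatrix} \quad\text{with}\quad w = \begin{bmatrix} w_1 \\ w_2 \end{bmatrix}, \quad u = \begin{bmatrix} u_1 \\ u_2 \end{bmatrix}, \quad f = \begin{bmatrix} f_1 \\ f_2 \end{bmatrix}.$$

Elimination subtracts $A^T C$ times the first block row from the second. With
$c_1 = c_2 = 1$, what matrix $-A^T C A$ enters the zero block? Solve for $u = (u_1, u_2)$.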

6 Continuing Problem 5 with $C = I$, write down $w = CAu$ and compute the
energy $E_{\min} = \tfrac12 w_1^2 + \tfrac12 w_2^2$. Verify that its derivatives with respect to $f_1$ and $f_2$
are the Lagrange multipliers $u_1$ and $u_2$ (sensitivity analysis).

Solutions 7.1



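4. The Lagrange function is $L = E_1(w_1) + E_2(w_2) - u_1(w_1 - w_2 - f_1) - u_2(w_2 - f_2)$.
Its first partial derivatives at the saddle point are

$$\begin{aligned} \frac{\partial L}{\partial w_1} &= \frac{dE_1}{dw}(w_1) - u_1 = 0 \\ \frac{\partial L}{\partial w_2} &= \frac{dE_2}{dw}(w_2) + u_1 - u_2 = 0 \\ \frac{\partial L}{\partial u_1} &= -(w_1 - w_2 - f_1) = 0 \\ \frac{\partial L}{\partial u_2} &= -(w_2 - f_2) = 0. \end{aligned}$$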



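5. $$\begin{bmatrix} C^{-1} & A \\ A^T & 0 \end{bmatrix} = \begin{bmatrix} c_1^{-1} & 0 & -1 & 0 \\ 0 & c_2^{-1} & 1 & -1 \\ -1 & 1 & 0 & 0 \\ 0 & -1 & 0 & 0 \end{bmatrix}$$

With $C = I$ elimination leads to $-A^T A$ in the zero block, where

$$A^T A = \begin{bmatrix} 2 & -1 \\ -1 & 1 \end{bmatrix}.$$

The equation...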



